
Stochastic Training of Neural Networks via Successive Convex Approximations



Abstract

This paper proposes a new family of algorithms for training neural networks (NNs). These are based on recent developments in the field of non-convex optimization, going under the general name of successive convex approximation (SCA) techniques. The basic idea is to iteratively replace the original (non-convex, highly dimensional) learning problem with a sequence of (strongly convex) approximations, which are both accurate and simple to optimize. Differently from similar ideas (e.g., quasi-Newton algorithms), the approximations can be constructed using only first-order information of the neural network function, in a stochastic fashion, while exploiting the overall structure of the learning problem for a faster convergence. We discuss several use cases, based on different choices for the loss function (e.g., squared loss and cross-entropy loss), and for the regularization of the NN's weights. We experiment on several medium-sized benchmark problems, and on a large-scale dataset involving simulated physical data. The results show how the algorithm outperforms state-of-the-art techniques, providing faster convergence to a better minimum. Additionally, we show how the algorithm can be easily parallelized over multiple computational units without hindering its performance. In particular, each computational unit can optimize a tailored surrogate function defined on a randomly assigned subset of the input variables, whose dimension can be selected depending entirely on the available computational power.
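The SCA idea sketched in the abstract can be illustrated with a minimal, self-contained example: linearize the network around the current weights so that a convex loss (here the squared loss) plus an L2 regularizer and a proximal term yield a strongly convex surrogate with a closed-form minimizer, then move toward that minimizer on a random minibatch. This is only a sketch under assumptions; the network architecture, the finite-difference Jacobian, the function names (nn_forward, sca_step), and the step-size schedule are illustrative and not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def nn_forward(w, X, n_hidden):
    """Tiny one-hidden-layer tanh network; w is a flat parameter vector."""
    d = X.shape[1]
    W1 = w[: d * n_hidden].reshape(d, n_hidden)
    w2 = w[d * n_hidden :]
    return np.tanh(X @ W1) @ w2

def nn_jacobian(w, X, n_hidden, eps=1e-6):
    """Finite-difference Jacobian of the network outputs w.r.t. the weights:
    only first-order information about the network is used."""
    f0 = nn_forward(w, X, n_hidden)
    J = np.empty((X.shape[0], w.size))
    for k in range(w.size):
        wp = w.copy()
        wp[k] += eps
        J[:, k] = (nn_forward(wp, X, n_hidden) - f0) / eps
    return f0, J

def sca_step(w, Xb, yb, n_hidden, lam=1e-3, tau=1.0, gamma=0.5):
    """One stochastic SCA step on a minibatch (Xb, yb):
    linearize the network around w, keep the convex squared loss and the
    L2 regularizer, add a proximal term tau * ||v - w||^2 so the surrogate
    is strongly convex, solve it in closed form, and take a convex
    combination step of length gamma toward the surrogate minimizer."""
    f0, J = nn_jacobian(w, Xb, n_hidden)
    r = yb - f0
    # Surrogate in v:  ||r - J (v - w)||^2 + lam * ||v||^2 + tau * ||v - w||^2
    A = J.T @ J + (lam + tau) * np.eye(w.size)
    delta = np.linalg.solve(A, J.T @ r - lam * w)  # = v_hat - w
    return w + gamma * delta

# Toy regression problem (synthetic data, for illustration only).
X = rng.standard_normal((256, 5))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(256)

n_hidden = 8
w = 0.1 * rng.standard_normal(X.shape[1] * n_hidden + n_hidden)

for it in range(200):
    idx = rng.choice(len(X), size=32, replace=False)  # random minibatch
    gamma = 1.0 / (1.0 + 0.01 * it)                   # diminishing step size
    w = sca_step(w, X[idx], y[idx], n_hidden, gamma=gamma)

print("final mean squared error:", np.mean((y - nn_forward(w, X, n_hidden)) ** 2))
```

In this toy setup, the surrogate is a small ridge-regression problem solved exactly at each iteration, while stochasticity enters only through the minibatch selection; the paper's algorithms also cover other convex losses (e.g., cross-entropy) and parallel optimization over random subsets of the variables, which this sketch does not attempt to reproduce.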
